Analysis of incorrect POS-tagging in student texts with linguistic errors in German

Abstract

The electronic learner corpus of student texts in German, the PACT, contains part-of-speech (POS) tagging. This markup is performed automatically using RFTagger. Since the texts are written by students, they may contain various kinds of errors: grammatical, spelling, stylistic, and others. Sentences may be formulated incorrectly, without taking into account the rules of the language and accepted norms. This can affect the work of programs that process texts in automatic mode and, as a result, generate incorrect tagging that needs to be verified manually. The purpose of the study is to investigate the degree of influence of errors in non-authentic texts on the results of automatic part-of-speech tagging. Based on expert error tagging of the texts, 11 types of errors were identified that may affect tagger quality. For each type of error, ten sentences containing an error were selected from the corpus. The resulting pool of sentences was processed by the taggers RFTagger and TreeTagger. The parts of speech suggested by these taggers were compared with the parts of speech determined by experts. As a result of the comparison, the following patterns were revealed: the taggers are mistaken when the non-declinable form of an adjective is written instead of the declinable form; when one word is written separately; in the absence of the suffix "-er" in possessive adjectives formed from geographical names; when nouns are written with a lowercase letter; and when a verb is written with a capital letter. In each case, the article provides an analysis of the forms and causes of incorrect POS-tagging, as well as the differences between the results of the two taggers. Taking the revealed patterns into account will allow more efficient organization of POS-tagging verification of student texts in German. The results will also be useful for developers...
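The comparison step described in the abstract (tagger output checked against expert tags, grouped by error type) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name `mismatch_rates`, the error-type labels, and the toy tag sequences are all hypothetical, and it assumes the output of both taggers has already been mapped to the same coarse tagset as the expert annotation and aligned token by token.

```python
from collections import defaultdict


def mismatch_rates(samples):
    """Return the share of tokens per error type where tagger and expert disagree.

    samples: iterable of (error_type, expert_tags, tagger_tags) triples.
    The tag sequences are assumed to be aligned token by token (hypothetical input format).
    """
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for error_type, expert_tags, tagger_tags in samples:
        if len(expert_tags) != len(tagger_tags):
            raise ValueError("tag sequences must align token by token")
        for gold, predicted in zip(expert_tags, tagger_tags):
            totals[error_type] += 1
            if gold != predicted:
                wrong[error_type] += 1
    return {etype: wrong[etype] / totals[etype] for etype in totals}


# Illustrative toy input only; these are not data or results from the study.
samples = [
    ("noun_lowercase", ["ART", "NN", "VVFIN"], ["ART", "ADJA", "VVFIN"]),
    ("verb_capitalized", ["PPER", "VVFIN"], ["PPER", "NN"]),
    ("verb_capitalized", ["PPER", "VVFIN"], ["PPER", "VVFIN"]),
]
rates = mismatch_rates(samples)
```

Running the same function once per tagger and comparing the resulting dictionaries would give a simple per-error-type view of where the two taggers differ.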

Similar resources

PoS-tagging Italian texts with CORISTagger

This paper presents an evolution of CORISTagger [1], a high-performance PoS-tagger for Italian developed at the University of Bologna. The system is composed of a second-order Hidden Markov Model tagger followed by a Transformation Based tagger. The use of such a stacked structure, paired with a powerful morphological analyser based on a large lexicon composed of 120,000 lemmas, allowed the ta...

Fine-Grained POS Tagging of German Tweets

This paper presents the first work on POS tagging German Twitter data, showing that despite the noisy and often cryptic nature of the data a fine-grained analysis of POS tags on Twitter microtext is feasible. Our CRF-based tagger achieves an accuracy of around 89% when trained on LDA word clusters, features from an automatically created dictionary and additional out-of-domain training data.

POS Tagging for Historical Texts with Sparse Training Data

This paper presents a method for part-of-speech tagging of historical data and evaluates it on texts from different corpora of historical German (15th–18th century). Spelling normalization is used to preprocess the texts before applying a POS tagger trained on modern German corpora. Using only 250 manually normalized tokens as training data, the tagging accuracy of a manuscript from the 15th cen...

Analysis of power in the network society

Scholars and experts in the social sciences believe that a new stage in the history of human societies has begun. The features of this new society include phenomena such as the global informational economy, the variable geometry of networks, the culture of real virtuality, the astonishing development of digital technologies, online services, and the compression of time and space. On the other hand, power, as the central subject of political science, holds an important place in human relations; power and the reproduction...

Collocation errors in translations of the Holy Quran

The present study aims at identifying, classifying, and analyzing collocation errors made by translators of the Holy Quran into English. Findings indicated that, in terms of collocation, the most acceptable translation was done by Irving and the least appropriate one by Pickthall.

Journal

Journal title: Naučnyj rezul'tat

Year: 2022

ISSN: 2518-1092

DOI: https://doi.org/10.18413/2313-8912-2022-8-3-0-6